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Abstract. We present a software package that guesses formulae for sequences 
of, for example, rational numbers or rational functions, given the first few 
terms. We implement an algorithm due to Bernhard Beckermann and George 
Labahn, together with some enhancements to render our package efficient. 
Thus we extend and complement Christian Krattenthaler's program Rate.m, 
the parts concerned with guessing of Bruno Salvy and Paul Zimmermann's 
GFUN, the univariate case of Manuel Kauers' Guess. m and Manuel Kauers' and 
Christoph Koutschan's qGeneratingFunctions .m. 

1. Introduction 

For some a brain-teaser, for others one step in proving their next theorem: given 
the first few terms of a sequence of, say, integers, what is the next term, what is 
the general formula? Of course, no unique solution exists, but, by Occam's razor, 
we will prefer a 'simple' formula over a more 'complicated' one. In this article we 
present a new package that aims at finding such a simple formula, written for the 
computer algebra system FriCAsQ 

Some sequences are very easy to 'guess', like 

(1) 0,1,4,9,..., 
or 

(2) 1,1,2,3,5,.... 
Others are a little harder, for example 

(3) 0,1,3,9,33,.... 

Of course, at times we might want to guess a formula for a sequence of polynomials 
or rational functions, too: 

(4) l,l + q + q\ {l+q + q^){l + q^), (1 + <z')(l + q + q^ + q"" + q^) . . . , 
or 

(5) 1 - 2g, (1 - q){l - 2q)\ (1 - qf{l - 2q){l - 2q - 2q^)\ . . . 

1 - g 

Fortunately, with the right tool, it is a matter of a moment to figure out formu- 
lae for all of these sequences. In this article we describe a computer program 
that encompasses well known techniques and adds new ideas that we hope to 
be effective. In particular, we generalise both Christian Krattenthaler's program 
Rate .m [20] . and the guessing functions present in GFUN written by Bruno Salvy and 
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Paul Zimmermann i25j . and in qGeneratingFunctions .m by Manuel Kauers and 
Christoph Koutschan [16] [17] . With a little manual aid, we can guess multivari- 
ate formulae as well, along the lines of Doron Zeilberger's programs GuessRat and 
GuessHolo j35]|36], or Manuel Kauers' program Guess. m [15]. All these programs, 
as well as the one presented here, try to compute a function that yields the terms 
when evaluated at 0, 1, 2, and so on. We describe this computational approach in 
more detail beginning in Section [5] 

A completely different idea is pursued by The online encyclopedia of integer 
sequences of Neil Sloane |28| . There, you can enter a sequence of integers and 
chances are good that the website will respond with one or more likely matches. 
However, the approach taken is quite different from ours: the encyclopedia keeps a 
list of currently roughly 160, 000 sequences, entered more or less manually, and it 
compares the given sequence with each one of those. Besides that, there is an email 
service called SuperSeeker that tries some transformations on the given sequence 
to find a match in the database. Furthermore it tries some programs in the spirit 
of Rate.m and GFUN to find a formula, although with a time limit, i.e., it gives up 
when too much time has elapsed. 

Thus, the two approaches complement each other: For example, there are se- 
quences where no simple formula is likely to exist, and which can thus be found 
only in the encyclopedia. On the other hand, there are many sequences that have 
not yet found their way into the encyclopedia, but can be guessed in easily by your 
computer. 

In Section [3] we outline the capabilities of our package. In Section |4] we describe 
the most important options that modify the behaviour of the functions. A very 
brief description of the algorithms used and the efficiency problems encountered is 
given in Section [5] and thereafter. 

2. History 

On the historical side, we remark that already in 1964, Malcolm Pivar and Mark 
Finkelstein j23j implemented a program to identify sequences given their first few 
terms, see also Paul W. Abrahams [T|. One interesting feature of their program 
was the ability to deal with exceptions to a rule: their program would apply, for 
example, the differencing operator, until most of the terms would be equal. In 
a second step, it would then locate the exceptions to the rule and try to guess 
formulae for the positions and for the values of the exceptions. 

The first edition [27] of 'A Handbook of Integer Sequences' by Neil Sloane ap- 
peared in 1973. In 1992, Frangois Bergeron and Simon Plouffe [5] explored the 
idea of applying various transformations to the given sequence, for example series 
reversion. They then used Pade approximation to see whether the result might be 
rational. In the same article an experimental program to check for 'constructible 
differentially finite' series is also briefly described, but it seems that it was not so 
successful. 

In Physics, Richard Brak, G. S. Joyce, Michael E. Fisher, Helen Au-Yang, An- 
thony Guttmann [H] [13] developed methods using algebraic and hypergeometric or 
holonomic functions to fit series data starting from the early seventies. Of course, 
they named their techniques differently, and, more importantly, they were primar- 
ily interested in estimating 'critical exponents' and 'critical points' of the function 
whose first few Taylor coefficients are given. 
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3. Some Function Classes Suitable for Guessing 

In this section we briefly present the function classes which are currently explic- 
itly covered by our package. We want to stress however, that in many cases it is 
easy to add other function classes, should the need arise. (This will become clear 
in Section m) 

Throughout this article, n n- fin) is the function we would like to guess, and 
f{x) = X]n>o •/'(")^" generating function. The values f{n) are supposed to 

be elements of some field K, usually the field of rationals or rational functions. We 
alert the reader that the first value in the given sequence always corresponds to the 
value /(O). 

3.1. Guessing sequences /(n). 

guessRec: finds recurrences of the form 

(6) p(/(n),/(n + l),...,/(n + fc)) =0, 

where p is a polynomial with coefficients in K[rt]. For example, 
guessRec [1,1,0,1,- 1,2,- 1,5,- 4,29,- 13,854,- 685] 
yields 

[fin) : -/(n + 2) - f{n + 1) + f{n)' = 0, /(O) = 1, /(I) = 1]. 

Note that, at least in the current implementation, we do not exclude 
solutions that do not determine the function / completely. For example, 
given a list containing only zeros and ones, one result will be 

[/(n) = 0,/(0) = ...]. 

guessPRec: only looks for recurrences with linear p, i.e., it recognises P- 
recursive sequences. As an example, 

guessPRec [0, 1, 0, -1/6, 0, 1/120, 0, -1/5040, 0, 1/362880, 
0, -1/39916800, 0, 1/6227020800] 

returns 

[/(„) : (_„2 _ _ 2)j(„ + 2) - fin) = 0, /(O) = 0, /(I) = 1]. 

guessRat: finds rational functions. For the sequence given in ([T]), we find 

as likely solution. 
guessExpRat: finds rational functions with an Abelian term, i.e., 

/(n) = (a + H"4^ 
s{n) 

where r and s are polynomials. 
guessExpRat [0,3,32,375,5184] 
yields 

n(n + 2)", 

which could be interpreted, for example, as the number of labelled trees 
with one edge selected. 
guessBinRat: finds rational functions with a binomial term, i.e., 

fa + bn\ r{n) 

fin) = ]-rT 
\ n J s[n) 

where r and s are polynomials. 
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Concerning g-analogues, guessRec(q) finds recurrences of tlie form (jG]), wlrere 
p is a polynomial with coefficients in K[q, g"]. Similarly, we provide g-analogues for 
guessPRec and guessRat. For example, to guess a formula for Sequence (U), we 
entei0 

guessRat (q) ( [1 , l+q+q~2, (l+q+q'2) * (l+q'2) , (l+q~2) *(l+q+q~2+q~3+q'4)] , 
[]) 

and obtain as function 

Unfortunately the simplifying capabilities of FriCAS are rather weak, so it takes 
some extra work to simplify the above expression to 

{l-q){l-q') ' 

i.e., the g-binomial coefficient ["+^]^ := where [n]q := = l + q + 

Moreover, it is also possible to guess 'mixed' recurrences, i.e., where p has coeffi- 
cients in K.[q, n, g"], see the description of the option maxMixedDegree in Scction|4l 
For example, 

guessPRec (q) ( [1 , 1 ,2*q~2 , 6*q"6 ,24*q'12, 120*q'20 ,720*q'30 ,5040*q'42] , 
maxMixedDegree==2 , homogeneous==true) 

returns 

[fin) : in + l)/(n + l)^^" _ /(„ + 1) = 0, /(O) = 1]. 
The g- version of guessExpRat recognises functions of the form 

fin) = ia + bq-r^y 

a and b being in K[g] and r and s polynomials with coefficients in ]K[g]. For Se- 
quence ([5]), we enter 

guessExpRat(q) ([(l-2*q)/(l-q) ,l-2*q, (1-q) * (l-2*q) "3 , 
(l-q)-2*(l-2*q)*(l-2*q-2*q-2)-3] , [] ) 

to obtain 

^^^(2g" -3g + l)". 

9-1 

Another example would be Nicholas Loehr's g-analogue [n + l]g^^ of Cayley's for- 
mula. 

Finally, guessBinRat (q) tries to fit the given terms to 



fin) 

where ["] = n" i ■ 



a + bn 
n 



Kg") 



Because of a flaw in FriCAS, one has to explicitly specify a list of options when using the q- 
versions of the guessing functions. In the example above, we simply gave an empty list of options, 
and did thus not override any of the default options. 
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3.2. Guessing Series f{x). 

guessADE: finds an algebraic differential equation for f{x), i.e., an equation 
of the form 

(7) =0, 

where p is a polynomial with coefficients in K[a;]. A typical example is 

guessADE [1,1,2,9/2, 32/3 , 625/24 , 324/5 , 117649/720 , 131072/315 , 
4782969/4480] 

returns 

[[x"]f{x) : -xf'ix) + f{xf ~ fixf = 0, /(O) = 1, /'(O) = 1]. 

Maybe more interesting, we obtain also a differential equation for the 
exponential generating function with coefficients of the form covered by 
guessExpRat: 

guessADEC [(a*n+b)~n/factorial(n) for n in 0..32], 

maxPower==3 , maxDerivative==3 , homogeneous==true) 

However, this equation is already quite big: 

462(a + bff{xff"{x) + 3ab^{a + b)xj{xf f" {x) - 4b\2a + bfj{x)f'{xf 

- a(a3 + aH + l<dab^ + 3b^)xf{x)f'{x)f"{x) - 3a^{a ~ 3b)f{x)f'{x)f"'{x) 

+ 5a^{a - ib)x^f{x)f"{xf + Aa'^xf'{xf = 0. 

We stress that we did not try to prove this equation - it remains a guess, 
even though we checked the first few hundred terms. 

Another interesting example is given by the generating function for 
the chromatic polynomials of rooted triangulations, as found by William 
Tutte [31]. Or, as a test case, to guess a differential equation for Jacobi's 

2 

^-function 1 + 2 a list of the first 3600 terms, 

guessADEd, maxPower==14, maxDerivative==3 , maxDegree==6) 

and a little patience (roughly ten minutes on an AMD Opteron processor) 
suffice. In fact, according to Don Zagier [Ml Section 5.1, Proposition 15] 
already Ramanujan knew that every modular and every quasi- modular form 
on Fi satisfies a third order algebraic differential equation. 
guessHolo: only looks for equations of the form ([T]) with linear p, that is, it 
recognises holonomic or differentially-finite functions. It is well known that 
the class of holonomic functions coincides with the class of functions having 
P-recursive Taylor coefficients. However, the number of terms necessary 
to find the differential equation often differs greatly from the number of 
terms necessary to find the recurrence. Returning to the example given for 
guessPRec, we find that already 

guessHolo [0,1,0,-1/6,0,1/120] 

returns 

W]f{x) : -f"{x) - f{x) = 0, /(O) = 0, /'(O) = 1]. 

Moreover, now we immediately recognise the coefficients as being those of 
the sine function. 
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guessAlg: looks for an algebraic equation satisfied by f{x), i.e., an equation 
of the form 

p{f{x)) = 0, 

the prime example being given by the Catalan numbers 
guessAlg [1,1,2,5,14,42] 
which yields 

[[x"]f{x) : xfixf - fix) + 1 = 0, /(O) = 1]. 

guessPade: recognises rational generating functions or, equivalently, recur- 
rences with constant coefficients. For the Fibonacci sequence given in ([2]), 
we find as likely solution 

iWVix) : {x^+x-l)f{x) + l = Q]. 

guessFE: finds 'Mahler- type' functional equations for f{x) (see for example 
[21]), i.e., equations of the form 

(8) p{f{x)J{x^),...J{x''))=0, 

where p is a polynomial with coefficients in K[a;]. A typical example is the 
number of unlabelled rooted binary trees: 

guessFE [0,1,1,1,2,3,6,11,23] 

which returns 

[[x'']f{x) : f{x^) + f{xf ~ 2f{x) + 2x = 0, f{x) ^ x + x^ + x"^ + 2x* + 0(a;^)]. 

Browsing the online encyclopedia of integer sequences, we discovered an- 
other rather surprising functional equation: consider the sequence Al 18006 
of binary words defined by wi — "01" andw„+i = concat[wn, Wn, reverse{w. 
Then 

guessFE w 4 

indicates that the limiting word Woo satisfies 

{x - l){x^ -x + l)ix^ +X + l){{x^ +X+1) (/(.x) - x^f{x^)) + x{x? + \f = 0. 

Again, we did not try to prove this equation but only checked the first few 
hundred terms. 

For guessADE and guessHolo we provide g-analogues, replacing differentiation 
with g-dilation: guessADE (q) finds differential equations of the form 

(9) =0, 

where p is a polynomial with coefficients in K[(7,a;]. Generating functions satisfy- 
ing such g-equations frequently occur in the enumeration of polyominoes and the 
study of orthogonal polynomials. As an example, we can recover the g-algebraic 
differential equation for the generating function of bar polyominoes by horizontal 
perimeter - marked by x, vertical perimeter - marked by y and area - marked by 
g, as given by Richard Brak and Thomas Prellberg [24]. We enter 

guessADE(q) (1 , maxDerivative==l , maxPower==2 , maxDegree==l) 

where 1 are the first eleven coefficients of the series in x: 

1 := [0, q*y/(l-q*y), q"2*y*(l+q*y)/(l-q*y)/(l-q"2*y) , ...] 
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The solver immediately finds the solution 

: iqxf{x) + {qx + l)v)f{qx) + [qx - l)f(x) + qxy = 0, 

/(0) = 0,/'(0)= '^^ 



1 ■ • ■ J 1 



1 - 9?/ 

it then takes a few seconds to verify it. 

3.3. Operators. The observation made by Christian Krattenthaler before writ- 
ing his program Rate.m |20| is the following: it occurs frequently that although 
a sequence of numbers is not generated by a rational function, the sequence of 
successive quotients is. 

We slightly extend upon this idea, and apply recursively one or both of the two 
following operators: 

guessSum - A„: the differencing operator, transforming f{n) into f (n) — f {n— 

!)• 

guessProduct - (5„: the operator that transforms f{n) into f{n)/f{n — 1). 

For example, to guess a formula for Sequence we enter 

guess([0, 1, 3, 9, 33], [guessRat] , [guessSum, guessProduct]). 

The second argument to guess indicates which of the functions of the previous sec- 
tion to apply to each of the generated sequence, while the third argument indicates 
which operators to use to generate new sequences. 
The package will then respond with 

n—l s—1 

En(^+2), 

s=0 p=0 

i.e., the sum of the first factorials. 

In the case where only the operator Qn is applied, our package is directly com- 
parable to Rate.m. In this case the standard example is the number of alternating 
sign matrices 

guess [1, 1, 2, 7, 42, 429, 7436, 218348] 
which yields 

"-r^ ^y^' 27P + 5AI + 24 ^ n 3(3/ + 2)(3^ + 4) 
16/2 32/ H- 12 ^ J--!- 4(2/-|-l)(2/-|-3)' 

fc=0 i=0 fc=0 1=0 ^ ' 

3.4. Closure properties and zero test. Part of what makes a class of functions 
interesting arc its closure properties, summarised in the table below for some classes 
of functions. Apart from the theoretical point of view, it is also good to know that 
the computer can guess an equation for /(n) if it can do so for j(n) + 1. 

However, one has to keep in mind that even simple transformations may increase 
the number of terms necessary to successfully guess an equation dramatically. For 
example, consider the (exponential) generating function for the Bell numbers i?„, 
counting the number of partitions of {1, 2, ... , n}, which is 



B[x) = Y,B^ 



n>0 
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This series is not holonomic, but it satisfies the simple algebraic differential equation 
B"B — {B'Y' — B'B = 0, and the first thirteen terms suffice to find it. By contrast, 
it takes 36 terms to guess the shifted series {e^ ~^ — l)/x. 

In the same spirit, note that without specifying the search space any further, 
already six terms are enough to guess a functional equation for the number of 
unlabelled rooted binary trees. On the other hand, we need at least 42 terms to 
guess an equation for the square of their generating function. 

This phenomenon also explains why Christian Krattenthaler's program Rate.m 
is so useful: of course there is also an algebraic recurrence for the number of alter- 
nating sign matrices, namely 

(-1671^ - 32n - 12)/(n)/(n + 2) + (2771^ + 54n + 24)/(n + 1)^ = 0, 

but we need 35 terms to guess it instead of eight. (Instead of looking for a 
formula having k nested products, we could also use the options Somos==true , 
maxShif t==k, homogeneous==2~ (k-1) , but this only works well for k less than 4.) 
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Table 1. closure properties. ((.)~^: multiplicative inverse, o: com- 
position, {.Y~^^: compositional inverse, /: definite integration, 
0: Hadamard product, S: shift, alg.: algebraic substitution; in- 
verse and substitution only apply when the result is again a formal 
power series. 



Proofs for the closure properties of rational, algebraic and linear differential 
equations can be found, for example, in Richard Stanley's book Enumerative Com- 
binatorics 2 |30] or his article on differentiably finite series [29]. For algebraic dif- 
ferential equations, proofs were given by Alexander Ostrowski in [55], see also [19] . 
Slightly weaker closure properties hold for the q case. In particular, g-holonomic 
series are only closed under the substitution x i-^ x'^jk gN, see for example [TS]. 

Algebraic recurrence relations seem to satisfy no interesting closure properties: 
for example, take any sequence that does not satisfy an algebraic recurrence rela- 
tion, and write it as the sum of two sequences, one with odd terms zero, the other 
with even terms zero. Both summands are solutions of f{n)f{n + 1) = 0. However, 
a related class, so called 'admissible recurrences' was studied by Manuel Kauers [14] 
and has been shown to enjoy many closure properties. 

Similarly, we are not aware of any results concerning closure properties of Mahler- 
type functional equations as defined in our paper. However, linear equations 
p (^f{x), f{x^), f{x^^) . . . , f{x^'')^ = for fixed r were shown by Phillippe Du- 
mas [11] to give rise to nice closure properties. This extends to the non-linear 
case. 

One very important property of these classes is the availability of a zero-test, i.e., 
an algorithm that will decide whether any given equation (together with sufficiently 
many initial values) has only the zero solution. For linear differential equations 
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this is folklore, for algebraic differential equations an algorithm was proposed for 
example by Joris van der Hoeven |32j . In many cases, such a test allows to verify 
conjectured identities automatically, as exercised for example by GFUN. 



4. Options 

To give you the maximum flexibility in guessing a formula for your favourite se- 
quence, we provide options that modify the behaviour of the functions as described 
in Section |3l The options are appended, separated by commas, to the guessing 
function in the form option==value. See below for some examples. 

maxDerivative, maxShift: specify the maximum derivative in an algebraic 
differential equation, or, in a recurrence relation, the maximum shift. Set- 
ting the option to arbitrary specifies that the maximum derivative - the 
maximum shift ~ may be arbitrary, which is the default. 

maxPower: specifies the maximum total degree in an algebraic differential 
equation or recurrence: for example, the degree of [f'Yf is 4. Setting 
the option to arbitrary specifies that the maximum total degree may be 
arbitrary, which is the default. 

homogeneous: specifies whether the search space should be restricted to ho- 
mogeneous algebraic differential equations or homogeneous recurrences, i.e., 
the case where the polynomial p in Equation © and Equation ([7]) is homo- 
geneous. By default, it is set to false. Setting it to a positive integer, only 
homogeneous polynomials p of this degree are tried. Setting it to true, all 
homogeneous polynomials p up to total degree maxPower are tried. 

Somos: specifies whether the search space should be restricted to algebraic 
differential equations where the sum of differentials is constant. Similarly, 
when guessing recurrences, Somos insists that the sum of shifts is con- 
stant. By default it is set to false. Setting it to a positive integer, the 
sum of differentials or shifts must be equal to this number. Setting it to 
true is equivalent to invoking the guesser with Somos==2, Somos==3, . . . , 
Somos==d, where d is the specified maxDerivative (or maxShift) times 
maxPower or homogeneous. 

maxDegree: specifies the maximum degree of the coefficient polynomials in an 
algebraic differential equation, a Mahler-type functional equation or a re- 
currence with polynomial coefficients. For rational functions with an expo- 
nential term, maxDegree bounds the degree of the denominator polynomial. 
The default value of maxDegree is arbitrary. 

allDegrees: specifies whether all possibilities of the degree vector - tak- 
ing into account maxDegree - should be tried. The default is true for 
guessPade and guessRat and false for all other functions. 

maxMixedDegree: allows guessing of mixed g-recurrences. Its value deter- 
mines the maximum degree of in the coefficients, default being zero. 

maxLevel: specifies how many levels of recursion are tried when applying op- 
erators as described in Section 13.31 Note that, applying either of the two 
operators results in a sequence which is by one shorter than the original se- 
quence. Therefore, in case both guessSum and guessProduct are specified, 
the number of times a guessing algorithm from the given list of functions 
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is applied is roughly 2", where n is the number of terms in the given se- 
quence. Thus, especially when the list of terms is long, it is important to 
set maxLevel to a low value. 

Still, the default value is arbitrary, which means that the number of 
levels is only restricted by the number of terms given in the sequence, 
safety: specifies, as explained in detail in Section[5]and Section|6]the number 
of additional equations a solution has to satisfy. The default setting is 1. 

Experiments indicate that, the larger the class of functions covered, the 
larger one should set safety. Moreover, when the sequence contains many 
zeros, higher settings of safety are appropriate. For all algorithms we rec- 
ommend to set safety higher than the number of trailing zeros. The reason 
is best illustrated by an example: 

guessPade ( [a,b, c ,0] ) 

returns 

[[a;"]ca;^ +bx + a]. 

In other words, if the sequence has a trailing zero, guessPade trivially finds 
a solution. A few experiments and a moment's thought will reveal that the 
other algorithms behave similarly. 

check: determines whether we want to check the solutions returned by the 
modular solver using a deterministic check, or whether we content ourselves 
with a (rather weak) Monte-Carlo type check, or skip checking entirely, the 
default value being deterministic. 

checkExtraValues: specifies whether we want to return only those solutions 
that fit the given data perfectly. With checkExtraValues==f alse, the 
complete basis of the solution space is returned, see Section [6] The default 
value is true. 

one: specifies whether the guessing function should return as soon as at least 
one solution is found. By default, this option is set to true. 

indexNcime, variableName, f unctionName: specify symbols to be used for 
the output. The defaults are n, x and f respectively. 

debug: specifies whether information about progress should be reported. 

5. Rational Interpolation 

The underlying idea of all guessing software is to fit the given data to a model. For 
example, a formula for Sequence ([T]), is almost trivial to guess: it seems obvious that 
it is . A natural model to check is that the sequence in question is generated by a 
polynomial ~ we simply apply polynomial interpolation. Given a list of four terms 
- 0, 1, 4, 9 in our example - we should expect that we need a polynomial of degree 
three to interpolate. Since the actual degree is lower, that is, the interpolating 
polynomial is overdetermined by the data, it is reasonable to accept as a good 
guess. 

Generalising to Hermite-Pade interpolation, we can cover most models described 
in Section Ism 

Rational Interpolation Problem, Sequence Variant. Letf = [/(^^(n), . . . , /('"^ 
be a vector of (truncated) sequences over some integral domain, and n = [n^^\ . . . , n'"' 
a vector of non-negative integers, serving as degree bounds. Let a >0. Determine 
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a polynomial vector p = [p''^\n), . . . ,p(™)(n)] with degp(')(ri) < n^'-* such that 

(10) p^i) (n) • /(I) (n) + ■ • • + (n) • /(™) (n) = /or < n < o". 

Note that, by equating coefficients, this problems can be reduced to solving an 
appropriate linear system of equations with n^^^ + • • • + rS™"^ — 1 unknowns, namely 
the coefficients of the polynomials p^^^n), . . . ,p(™^(n), up to normalisation. Thus, 
we will in fact determine a basis of the space of solutions. However, instead of 
using, for example, naive Gaussian elimination, we will take advantage of the special 
structure of these linear systems to achieve better performance. To illustrate, we 
would like to be able to solve systems where n'^^^ + • • • + n^™) is as large as ten 
thousand. 

Setting cr = n^^'> + n^^) - 1 and f = [(1, 1, . . . , 1), (/o, /i, • • ■ , U-i)] we would 
recover ordinary rational interpolation. However, to have more confidence in the 
'guessed' formula, we use a — n'^^' + ■ ■ • + n^'"^ — 1 + safety instead. 

More generally, to guess algebraic recurrences we consider the (infinite) sequence 
of monomials in the 'variables' /(n), f{n + 1), f{n + 2), . . . 

i 

where A = (Ai, A2, ■ . . ) runs over the integer partitions in lexicographic order: 

1, /(n), /(n)2, f{n + 1), f{n)^ f{n)f{n + l)J{n + 2), /(n)^ f{n)'f{n + 1), . . . 

Then, for each m > 2 we solve the rational interpolation problem with f given 
by the first m entries of this sequence, and n such that the number of unknowns 
_)_..._(_ — 1 in the corresponding linear system plus the specified value of 
safety equals the number of equations a. 

In the formulation of the rational interpolation problem above, the sequence of 
evaluation points was chosen as 0, 1, 2, ... , but it is straightforward to generalise to 
arbitrary evaluation points. Doing so, we can also find g-recurrences, by pretending 
that / is given at the points g^, g^, . . . instead. 

To deal with the models described in Section [3?2l we need to solve another variant 
of the rational interpolation problem: 

Rational Interpolation Problem, Series Variant. Lett = [f^^\x),...J^"'^x)] 
be a vector of ( truncated) power series over some integral domain, and n = [n^^^ . . . , n*- 
a vector of non-negative integers, serving as degree bounds. Let a>Q. Determine 
a polynomial vector p = [p'^^^x), . . . ,p^"^\x)] with degp*^'-* (a;) < n*-'' such that 

(11) ord(p • f) = ord (p^^Hx) ■ f^^\x) + ■■■+ p^"''> (x) ■ /'"'(x)) > a. 

In this case, setting a — n'^^ + n^^^ — 1 and f = [l,/(a;)], where f{x) is the 
truncated power series with the given values as Taylor coefficients, we recover Pade 
approximation. This allows us to 'guess' sequences that are Taylor coefficients of 
rational generating functions. 

To guess algebraic differential equations, we consider the sequence of monomials 
(jJi^^'^^ f{x)^ , where D is the differentiation operator and A = (Ai,A2,...) 
runs over the integer partitions in lexicographic order as before: 

1, f{^), f{x)\f'{x),f{xf,f{x)f'{x), f"{x),f{x)\ f{xff\x), . . . 
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To guess g-algebraic differential equations, we just replace the usual differentia- 
tion operator with g-dilation: D f{x) :— f{qx). Finally, guessFE uses the sequence 
of monomials f{x^^))^. 

For the present package, we originally implemented a fraction free algorithm 
proposed in 2000 by Bernhard Beckermann and George Labahn [4], which at the 
time proved much faster than what GFUN had. However, during the refereeing 
process it became clear that a modular approach would be even more efficient. 
This was first pointed out by Manuel Kauers, and independently by Alin Bostan 
and Bruno Salvy in private communication. Consequently, we decided to follow 
this approach and implemented a modular version of an older algorithm from 1994, 
also by Bernhard Beckermann and George Labahn [2], when the coefficients are 
rational numbers or rational functions with integer coefficients. This turned out to 
be very fruitful, although quite labour-some. For other coefficient domains we still 
use the fraction free algorithm, although we plan to extend the modular approach 
to allow algebraic numbers as coefficients as soon as possible. 

We would like to stress that meanwhile most of the packages mentioned in the in- 
troduction use modular techniques, however using other algorithms for solving over 
a prime field. According to Bruno Salvy, GFUN now uses an algorithm introduced 
in 1997, again by Bernhard Beckermann and George Labahn [3]. Manuel Kauers 
package Guess .m uses the solver provided by Mathematica, it is thus unclear which 
algorithm is used. 

Still, our package outperforms the other freely available packages, for many con- 
figurations of degree bounds and size of the vector f , (see Section [10]) , as well as - 
for univariate sequences - the range of formulae that can be guessed. 

We also implemented specialised algorithms to test whether the n"' term of the 
sequence is given by a formula of the form 

, , , , .„r(n) fa + bn\r(n) 

s[n) \ n J s[n) 

for some a and b and polynomials r and s. Unfortunately, we could not avoid 
solving non-linear equations in this case. Even after exploiting some surprising 
coincidences that reduce the size of the arising equations the performance of this 
algorithm is disappointing: already eight or nine terms, i.e., degree two in r and s 
pose a challenge, even over a finite field 

6. Safety 

How can we 'know' that a formula discovered via interpolation is appropriate? 
At first glance, the answer is quite simple: we use all but the last few terms of the 
sequence to derive the formula. After this, the last terms are compared with the 
values predicted by the polynomial. If they coincide, we can be confident that the 
guessed formula is correct. 

In the case of the rational interpolation problem we get the same set of ac- 
cepted solutions when we use all values, but keep lower degree bounds. We use 
this approach as it is more efficient than actually computing 'bad' solutions and 
rejecting them later, although there is a subtle interaction with an extra check that 
we perform. 



Meanwhile, it seems that we have found a suitable approach, but due to time constraints we 
cannot describe it in this article. 
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Very recently, Alin Bostan and Manuel Kauers [6l Section 2.4] described in some 
detail various other possibilities of checking whether a guessed formula is likely to 
be 'correct', the method we just outlined being clearly the most practical. Unfor- 
tunately, it turns out that this method is problematic in certain situations. In this 
section we explain why. 

First of all, we cannot expect that all elements of the solution space of the rational 
interpolation problem 'interpolate' the given data in the following sense: consider 
the truncated power series f{x) = l+x^+0{x''), and let f — [1, f{x), f'{x)]. Setting 
the vector of degree bounds n — [2,2,2] and a — 6 (note that we 'loose' one term 
because of differentiation, so we have 6 equations in our linear system), rational 
interpolation yields the basis [(1, —1, 0), (0, 0, x)]. Thus, the general solution to the 
rational interpolation problem with the given constraints is 

(a + Px){l-f{x))+^xf'{x)=0, 

a, /? and 7 being elements from the coefficient field. 

Apparently, none of the two basis vectors actually interpolates all given values: 
1 — / (x) = —x^+0{x'^), and xf'{x) = 6x^+0(0;''). One might be tempted to simply 
discard non-interpolating basis vectors (which we do when checkExtraValues is 
true), but doing so we risk loosing 'good' solutions, too: 

(67 + px) (1 - / {x)) + ^xj'[x) = Oix') 

interpolates just fine for any /3 and 7. In particular, the set of interpolating solutions 
is not a vector space. 

An uncomfortable consequence of the above is as follows: we provide an option 
maxDegree that allows the user to specify the maximum degree of the coefficient 
polynomials, see Section ID When set to some integer value d, we (essentially) do 
not compute solution spaces of configurations f with {d+ I) |f| being less than the 
number of values provided. Suppose now that we find an interpolating solution 
without setting maxDegree, and that the maximal coefficient degree of this solution 
happens to be d. Then it may be the case that setting maxDegree==d instead yields 
no result, because all basis elements are discarded. Similarly, one might expect 
that increasing both safety and the number of values by one does not yield more 
solutions. But at lower safety our check may reject all basis elements, while at 
higher safety the basis may contain an interpolating solution. 

A possible way to resolve this dilemma might be to reject solution spaces that 
are not one-dimensional. However, when pursuing this idea, another difficulty sur- 
faces: namely, it is not completely trivial to decide whether two solutions are really 
different. For example, consider f = [I, /(x), /'(x)], and suppose that f{x) is in 
fact a polynomial p{x). Then the interpolation routine will not only find the so- 
lution [p(a;),l,0], but also [p'(a;), 0, 1]. More generally, it is well known that one 
often needs more coefficients to determine the minimal order equation than to find 
a solution of higher order. Thus, if we have enough values to guess the minimal 
order equation then the problem is easy. But otherwise we will either find multiples 
of the minimal equation, or some parasitic solutions. 

This problem can be remedied, at least for linear and also algebraic differential 
equations: in the linear case, we could simply compute a greatest common right 
divisor of the given equations, whereas in the algebraic case we could apply Ritt 
elimination. 
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Still, there is again some danger that 'good' solutions are lost: for example, if a 
sequence is non-zero only at very few indices ni, n2, . . . , n^, then the interpolation 
algorithm will not only find the 'good' solution, but also (n — ni)(n — 712) . . . (ri — 
nfe)/(n) = 0, and the greatest common right divisor of the two will be trivial. 

We admit that so far we were unable to find a completely satisfying solution to 
this problem. In the meantime, we provide options (in particular checkExtraValues, 
and one, see Section [4]) that let the user decide. 

7. Rational Reconstruction 

As already mentioned in Section[5l our solver uses a modular technique: instead 
of solving the rational interpolation problem over the integers, we solve the prob- 
lem over several machine size prime fields and use Chinese remaindering to obtain 
integer solutions. In the same spirit, given coefficients that are rational functions, 
we evaluate them at several random points, solve the simpler problems and use 
rational reconstruction to obtain polynomial solutions. 

There are two different ways in which this plan can fail: it may happen that 
the solution of the problem in the prime field is not an image of the solution of 
the problem in the original ring. In Corollary 19.61 we will see that there are ways 
to discover such 'bad reductions', provided we have at least one 'good reduction'. 
However, we cannot a priori exclude the possibility that all reductions are bad. 

Moreover, it may occur that due to an unfortunate choice of evaluation points 
we obtain wrong solutions - usually, when we have too few evaluation points we 
get no solution, but it may happen that we construct one that is actually wrong. 

Therefore, the solution returned from the core solver is only probably correct, 
and we need to check it before returning it. Thus, the main loop of the solver in 
pseudocode is: 
repeat 

sol := do_solve (data, inner_call? == false) 
if check (sol) then return sol 

where do_solve produces a probably correct solution and check verifies correctness. 
So far we did not encounter a case where the check failed - the solver is designed 
in such a way that probability of wrong answer is very low. Therefore, instead of 
looping, we print an error message and fail. 

do_solve is a Brown-style '9 routine similar to Subroutine M and Subroutine P 
in the gcd algorithm of Mark van Hoeij and Michael Monagan [33] , which in pseu- 
docode looks as follows: 
do_solve (data) == 

if R = Z_p then return solve_over_Z_p(data) 

bad_count := 

good_count := 

sol := empty 

repeat 

modulus := choose_modulus () 
if inner_call? then 

new_data := eval(data, modulus) 

else 

new_data := reduce (data, modulus) 
new_sol := do_solve (new_data, inner_call? == true) 
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if new_sol = "no_solution" then return "no_solution" 
reduction_status := check_reduction(new_sol) 
if new_sol = "failed" or reduction_status = "bad" then 
bad_count := bad_count + 1 

if inner_call? and bad_count > good_count + 2 then 
return "failed" 

else 

good_count := good_count + 1 

if reduction_status = "all_bad" then sol := emptyO 
sol := chinese_remainder(sol, new_sol, modulus) 
rr := rational_reconstruction(sol) 
if not rr = "failed" then return rr 

In contrast to Mark van Hoeij and Michael Monagan we present this algorithm as 
a single routine, to stress that the processing is generic: in the outer level, when 
inner_call? is false, choose_modulus chooses machine size primes, in the inner 
level it chooses random evaluation points. 

The routine check_reduction applies CoroUarv 19.61 to new_sol - we keep the 
necessary information about previously obtained solutions in a global variable, 
namely the minimal dimension of the solution space and the minimal values of 
the critical indices. (Of course, when there are no previous solutions, new_sol is 
automatically treated as 'good'.) Also, we discard solutions with leading exponent 
being smaller than the leading exponent of previous solutions - this is necessary 
to ensure correct normalisation after rational reconstruction and also avoids the 
problem of 'bad content', see [33^. 

The counters bad_count and good_count are used to detect cases where we 
encounter bad reduction already at the outer level - we copied the method used in 
[33] . Without this test the solver would spend a lot of time reconstructing solutions 
which would then be discarded at outer level. 

The variable sol contains homomorphic images of the bases of the solution spaces 
constructed in the current stage, and rational_reconstruction(sol) tries to find 
the basis in the original ring. In the following, we indicate which tricks we decided 
to implement to make the procedure efficient enough. Namely, for polynomials 
we use naive quadratic multiplication, gcd and Chinese remaindering (Lagrange 
interpolation) routines. Also our rational reconstruction implementation uses a 
simple quadratic method. However, we save time by trying rational reconstruction 
not in every step but only after an interval: for polynomials we use a quadratically 
growing sequence of points, while for integers we switch to a big step (currently 
100) after passing a threshold (currently 200). 

More precisely, to reconstruct a solution in characteristic zero given modular 
images toi, TO2, . . . , r7i„ and moduli pi, p2, . . . , p„ we need to compute M such that 
M — rrii mod pi, and then apply rational reconstruction to M . However, instead of 
computing M directly we first compute intermediate solutions Mj such that Mj = 
rrii mod pi for i G {lOOj, . . . , lOOj + 99}, updating incrementally. Whenever we 
have finished such a block of 100 primes and computed Mj, we update M such that 
M = Mj mod Pj where Pj = pioojPiooj+i ■•■Piooi+99j and then apply rational 
reconstruction to M. This scheme makes more efficient use of bignum routines: we 
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perform most operations on relatively small bignmns, and fewer operations on big 
bignums. 

There is one more improvement to rational reconstruction which first appeared in 
NTL [26] (see also Section 3.1 of the description of the IML [10 , which is where we 
learned from the trick): when reconstructing the vector of rational coefficients, we 
incrementally compute the common denominator of the coefficients already recon- 
structed, and impose it on subsequent terms. Since we are looking for an integer (or 
polynomial) solution and fractions appear only due to normalisation, it is natural to 
expect all terms to have the same or very similar denominators. Often, multiplying 
the next term by the common denominator computed so far, it turns out that the 
product is already acceptable as rational reconstruction of this coefficient. 

8. Implementation aspects 

In this section we give some details of our implementation. More precisely, we ex- 
plain some choices we made when implementing the subroutines solve_over_Z_p, 
eval_or_reduce and check mentioned in the algorithm in Section [70 

8.1. solving over Zp. As already mentioned, our solver over Zp closely follows |2]. 
It returns a matrix of polynomials, every column constitutes one solution of the 
Hermite-Pade interpolation problem. Solutions, and also the residuals that occur 
within the algorithm, are packed in vectors of machine size integers. This is possible, 
since we perform the same operations on each component polynomial of the solution. 
However, it lowers control and memory management overhead. 

Computations are performed in place - otherwise memory management would 
dominate the run time. As basic operation we use 'multiply and add', that is we 
compute vi + cv2 for two vectors vi and V2 and a scalar c, and assign the result to 
vi. Compared to separate addition and multiplication with a scalar, this approach 
halves the cost of remainder computations needed for modular arithmetic and also 
halves the loop overhead. 

Currently the 'multiply and add routine' is written in Spad (FriCAS' high level 
implementation language) and via Common Lisp compiled to machine codeQ On 
64-bit machines 32-bit times 32-bit multiplication and 64-bit by 32-bit remainder 
are compiled directly to machine instructions. On a 64-bit Core 2 this leads to 
about 20 clocks per multiply and add step - it seems that this is the same as 
the cost of the machine instruction to compute a remainder. On 32-bit machines 
our compilation scheme performs operations involving 64-bit numbers (either as a 
result or as an argument) via calls to bignum routines, which causes much higher 
execution times. 

In principle we could speed up the solver over Zp replacing remainder operations 
by multiplications. Moreover, the 'multiply and add' routine is quite small so it 
would make sense to replace it by an assembler routine. We estimate that this would 
make our inner loop run 5-10 times faster. However, after some initial work our 
solver over Zp turned out to be the fastest part of the package, so we concentrated 
on removing bottlenecks in other parts. Also, despite the quadratic complexity 

^The actual code can be found in the files modhpsol . spad .pamphlet and 
mantepse . spad. pamphlet in the FriCAS distribution. 

^FriCAS can compile code using a variety of Common Lisp implementations. Currently, best 
performance is obtained with SBCL, |http : //www ■ sbcl . org| , thus in the following we assume this 
implementation. 
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and the non-optimal inner loop our solver seems to compare favourably with other 
programs, like GFUN, Guess. m and qGeneratingFunctions .m. 

8.2. computing modular images. Initially we used a very naive evaluation scheme 
to obtain the vector of truncated power series (or sequences) over Z^: we did the 
computation in characteristic and then computed remainders of division by p. 
Measurements showed that this approach used more than 98% of the execution 
time. Therefore, we switched to a faster scheme. 

For polynomials with integer coefficients we use a specialised routine to reduce 
coefficients modulo p. Also, for univariate polynomials with coefficients in Zp we 
use a specialised evaluation routine. However, for multivariate polynomials with 
modular coefficients we still use the naive method: we substitute for the variables 
in characteristic and then use the routine just mentioned to reduce coefficients 
modulo p. Of course this needlessly uses bignum arithmetic, but since multivariate 
evaluation is typically followed by several univariate evaluations the cost seem to 
be acceptable. 

Besides making the evaluation routines fast enough, it is important to generate 
the vector f of sequences or truncated power series efficiently. Initially we computed 
f in characteristic and then performed modular reduction. It turned out that 
doing the modular reduction only on the original sequence and computing the 
derived sequences only over Zp is much faster. 

Moreover, we remark that for truncated power series computing f involves com- 
puting many Cauchy products, which can be expensive. Therefore we implemented 
a simple optimiser, that takes a vector of monomials and detects common factors 
that can be cached. This reduced the number of Cauchy products that have to be 
computed significantly. 

With these improvements, the time needed for computing the modular images 
typically is comparable to the time needed for solving over Zp. 

8.3. checking solutions. For large problems, checking the solutions may be dom- 
inant factor. For sequences operations are performed pointwise, but for truncated 
power series we need polynomial multiplication which is much more expensive - 
the current polynomial routines of FriCAS are quadratic. Also, memory use may 
be a problem: we can solve the Hermite-Pade problem without actually comput- 
ing the series or sequences forming it (we only compute modular images), but we 
need the actual system to check the solutions. For example, we can guess an equa- 
tion for the generating function of Gessel walks (see the article by Manuel Kauers, 
Christoph Koutschan, and Doron Zeilberger about its holonomicity [18] and the 
article by Alin Bostan and Manuel Kauers about its algebraicity [7]) using a few 
hundred megabytes of memory, but explicitly storing the Hermite-Pade problem 
needs several gigabytes and we run out of memory on an 8 GB machine. 

We therefore introduced options, see Section SI that allow skipping the checks or 
to use a Monte-Carlo check. Although on small problems the Monte-Carlo check 
may be slower than the deterministic check, for the problem of Gessel walks it only 
takes about 30 sec. on a 2.4 GHz Core 2 and has moderate memory usage. 

There is an additional problem: our initial data are fractions. FriCAS fraction 
arithmetic simplifies fractions after each operation, which is quite costly (but also 
avoids intermediate expression swell which would be even more costly). In practise 
in many cases denominator is 1, and using fractions causes almost no extra cost. 
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But some problems really make use of fractional coefficients and for such systems 
checking is more costly. For linear recurrences we implemented a special purpose 
checking routine which computes the common denominator using smaller number 
of gcd operations and performs the rest of operations in fraction-free way. This 
routine also avoids storing the vector f of the Hermite-Pade problem and thus 
avoids problems with excessive memory usage. Unfortunately, for other problems 
checking is much more complicated so currently for them we use general purpose 
checking routine which uses standard fraction arithmetic. 

We also considered using a modular method for checking. Such a check would 
avoid problems with memory use. However, our a priori bounds on the number of 
evaluation points needed are so large, that the modular check is likely to be slower 
then our current version. Thus we only implemented a Monte-Carlo version of a 
modular check (currently only for the case of truncated power series) . 

9. Normalisation 

The output of the algorithm proposed by Bernhard Beckermann and George 
Labahn in 1994 2 is a so called a-basis (also known as order basis) for the solution 
space of the rational interpolation problem. 

In this section, we will call a polynomial vector a solution of the rational in- 
terpolation problem if it satisfies the order condition (ITU)) or (|lip . The degree 
constraints will be taken into account in a different manner, namely through the 
notion of defect: 

Definition 9.1. Let vector of degree bounds and p = 

[p^^\x), . . . ,p(™)(a;)] be a vector of polynomials. 

Then the defect of p is the (possibly negative) difference between degree bound 
and degree of p. More precisely: 

defect p — min In*-*-* — degp^*-* (a;) : i G {1, . . . , . 

For any 5 G Z, we denote by Cg the space of solutions with defect strictly larger 
than —6. 

A a -basis {pi , . . . , Pm} is a set of solutions of the rational interpolation problem, 
such that every solution q = [q^^^ (a;), . . . , g^™' (x)] can be written as a linear combi- 
nation q = ^r{x)pr, with defect q < defect Pr — deg ar{x), in a unique way. 
Note that the elements of the a-basis do not need to satisfy the degree bounds, i.e., 
their defect can be negative. 

An alternative description of cr-bases is given as follows, see Equation (6) 
and (7)]: 

(13) = spanjoi-'pr : 1 < r < m,j < defect p^ + S}, 
and 

m 

(14) dim Cg = ^ max(defect Pr — d,0). 

It follows that the set of defects is an invariant of the rational interpolation problem. 

We want to reconstruct a cr-basis over an integral domain i? (usually the integers 
or polynomials with integer coefficients) from several cr-bases over quotient rings 
i?//, where / is a prime ideal in i?. For this purpose it is crucial to know when 
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a tJ-basis over the field of fractions Frac{R/ 1) oi R/I is an image of the cr-basis 
over R. The difficulty is that for a given Hermite-Pade problem, the tr-basis is not 
uniquely determined. Fortunately, over a field we may obtain uniqueness using an 
appropriate normalisation. 

In the following, we define such a normalisation and show the existence and the 
uniqueness of a normalised a-basis. We then prove a statement that can be applied 
to detect 'bad reductions', i.e., to decide whether a given normalised cr-basis over 
Frac{R/I) is a modular image of a normalised cr-basis over Frac{R). 

Definition 9.2. Let p = \p^^^x), . . . ,p^"^\x)] be a polynomial vector, and n = 
[n^^\ ■ . ■ jn*^™)] a vector of degree bounds. The critical index of p is the minimal 
index of the vector where the defect is attained: 

criticalp = min |i : n*-*-* — degp'^^^x) — defectp,i G {1, . . . ,m}| . 

p is normalised if the leading coefficient of p^^\x) is 1, when i is the critical 
index of p. 

p is reduced with respect to another polynomial vector q = [q'^^\x), . . . , q^"^^x)], 
if at the critical index i of q we have degp*^*^(x) < deg (j^*^ (.t). A sequence of 
polynomial vectors {qi, . . . ,qrn} is reduced if q^ is reduced by q^ for all r ^ s. 
Note that this implies that the critical indices of the vectors must be all different. 

A sequence of polynomial vectors {qi, . . . , q_m} is sorted, if for r < s 

defect qr > defect qg or 

defect qr = defect qs and critical q,- < critical q,, . 

A sequence of polynomial vectors is normalised, if it is sorted, reduced, and all 
its elements are normalised. 

Lemma 9.3. Let R be an integral domain. If there is a normalised a-basis over R 
for the solution space of a given rational interpolation problem, then it is uniquely 
determined. 

Proof. Suppose we have two normalised bases P = {pi, . . . , p„i} and Q = {qi, . . . , q™}, 
and suppose that Pr- = qr for r < t. Assume without loss of generality that 
d := defect qt > defect pt. We will first show that the free module generated by the 
vectors in P with defect d coincides with the free module generated by the vectors 
in Q with defect d. 

Consider any q^ with defect q,. = d, and expand it in the u-basis P: q,- = 
X^rii (^s{x)Ps- We want to show that degas(x) = 0, i.e., as{x) G R for all s. 

Consider any s with ag ^ 0. Since P is a a-basis, we have d = defect q^ < 
defect Ps. Suppose that defect q^ < defect pg. If s > t then defect Ps < defect pt, 
since P is sorted, contradicting the assumption that d > defect pf. However, if 
s <t we have qs = Ps, and therefore that q,- is reduced with respect to pg. Thus, 
at the critical index i of p^ the degree of qr^ {x) is smaller than the degree of p'i^ {x), 
which contradicts as ^ 0. So we must have defect q^ = defect Ps, and therefore 
degas(a;) = 0. 

Similarly, expanding any p^ with defect d in the a-basis Q as Pr = Y^^=i Ps{x)<ls 

we obtain /3s{x) <E R: Q is a a-basis, so we have d = defect pr < defect qs. Suppose 
that defect pr < defect qs. Since Q is sorted we must have s <t and thus ps = qs- 
We then conclude as above that degps{x) = 0. 
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This shows that the free modules over R spanned by {prg, . . . , Pn} := {pr : 
defect Pr = d} and by {qs,,, . . . ,qsi} {qs : defect qs = d} coincide. Note that 
To = sq: because of defect p^o = d > defect pt we have rg < t. Thus, if we 
had So < To, then sq < t which impHes Ps„ = q^o, and finally defect Ps^ = d, 
a contradiction. Suppose now tq < sq, then tq < t, so q,.j, — p^^ and therefore 
defect q^o = d, again a contradiction. The number of vectors in both sequences 
must be the same, too, so ri = si. 

It remains to prove that the two polynomial sequences are in fact identical. If 
ro = ri, i.e., the module is one-dimensional, then the condition that p^f, and q^g 
are normalised implies that they are identical. 

If ro < ri, we first show that the critical index i of p^g is the same as the 
critical index j of q^. Since p^ is in the span of {qro, . ■ . ,qri}, there must also 
be a qr € {q^g, . . . , q^ } with degqr ^ > n^^^ — d = degpro- In fact, we have 
deg Qr''' = n^^'^ — d, since the defect of q^ is d. It follows that the critical index of 
qr is at most i, and, since {q^o, • ■ • , qri} is sorted, j < i. Interchanging the roles 
of {Pro I ■ • ■ ; Pri } and {q,.(j , . . . , qn }, we obtain i < j and therefore i = j. 

Thus also the modules spanned by {p^o+i, . . . , Pn } and {q^+i, . . . , qn } coin- 
cide: we can obtain both from the module spanned by {Pt-qj • ■ • i Pri} by restricting 
to the set of polynomial vectors v with deg u^*^ < n*^'^ — d. 

Iterating the argument of the previous two paragraphs, we find that the two 
modules generated by p^ and q^ coincide, and since these vectors are normalised, 
they must be identical. In particular, their critical indices coincide. Because the 
sequences {p^Q , • . . , Pn } and {q,.;, , . . . , q^ } are reduced, we can reuse the argument 
of the previous paragraph with i being this critical index, to obtain that also the 
modules spanned by {Pro, ■ ■ ■ , Pri-i} and {q^o, • • ■ , qr i-i} coincide. Now induction 
shows that p^ = qr for r € {ro, . . . , ri}, which concludes the proof of uniqueness. 

□ 

Lemma 9.4. The solution space of every rational interpolation problem has a nor- 
malised a-basis over a field. 

Proof. The existence of tr-bases over a field is meanwhile well known, for example 
Bernhard Beckermann and George Labahn prove that their algorithm produces a 
(T-basis in Let P = {pi, . . . , Pm} be a cr-basis. We will show that for any d we 
can replace {p,. : defect p,. > d} by an equivalent normalised sequence. We proceed 
by induction: assume that 5*0 = {pi, . . .pfc} is a normalised sequence of vectors 
with defect p,. > d, and let Si — {pfe+i, . . . , p;} is a sequence of vectors with defect 
d, which we will successively add to Sq. 

First of all we remark that replacing any vector q of a cr-basis P by q = q + ap, 
where p G P and defect p > defect q -I- deg a, again yields a cr-basis. Moreover, 
since by Equation the set of defects is an invariant of the solution space, we 
have defect q = defect q. 

Thus, by subtracting appropriate polynomial multiples of previous elements we 
may assume that each pt G is reduced with respect to Pr G So- Namely, let 
q G S*! , let Qo be set of critical indices of elements of 5*0 and let di = defect p^ with 
r such that i is critical index of p^. Consider Ci = ti*-*' — degg*^*'. If Ci > di for all 
i ^ Qo, then q is reduced with respect to all p^ G 5*0, r < io- Otherwise consider 
Qi = {i : Ci < di} and let Q2 be set of z G Qi such that Ci is minimal. Select the first 
element p G So, with critical index io G Q2- Then choose a, such that deg(g^*°^ — 
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Q,p('i))^ < dcg(jr(*''^ and replace q in S": by q = q — ap. Let bi = n*^'' —dcgq^^K Note 
that deg a = di^— Ci^ and (by the definition of defect) n^*^ — degp^*) > dig for all i, 
so deg{ap^^^) < n*^*^ — Cig. Moreover, if i is the critical index of p,. which is smaller 
than p, then the defect of p^ is bigger or equal to the defect of p, and, since p is 
reduced with respect to p^, wc have n^'' — degp^*' > dt^ and deg(ap^'^) < n^'^ — Cj^. 
Next degq')^*' < max ( degg(*\ deg(Q;p*^*'). Consequently, we have for all i 

bi = n^^ - deg q^'^ 

> n^'^ - max ( dog g*^*' , dcg(ap^*) )) 



W - deg , n(*) - deg(apW )) 



= min (ci, n^*-* — deg(ap^*^)) 
> min(Q,co), 

and for i corresponding to p^ less or equal to q we have bi > min(cj, Cq + 1). This 
means that after a finite number of reductions passing trough qo = q, qi = q, 
q2, etc. we will increase minjn^'^ — deggj*^) : i G Qi}- Continuing the reduction 

process we will increase {i € Qo : n^^^ — degqj^^ > di} and eventually qj will be 
reduced with respect to all elements of Sq. 

Note that every element Pr G So is automatically reduced with respect to every 
clement q G 5*1, because the defect of Pr is strictly smaller than the defect of q: 
let i be the critical index of Pr G ^o, then n^*) — degg^*^ > defect q > defect Pr = 
- deg pi*'. 

It remains to show how to ensure that p^ G Si is reduced with respect to Ps € Si. 
This can be achieved by applying a simpler version of process described above: in a 
first step reduce Pfe+2, • • • , P; with respect to Pk+i, then p^+a, • • • , p/ with respect 
to the sequence {pfc+i,Pfc+2} and so on. Note that, since the defects of these 
vectors are all equal, the degree of a will always be zero. This implies that the 
vector replacing Ps will remain reduced with respect to Pr, r < s. 

After this step, all Ps G Si are reduced with respect to p^ G with r < s. 
Similarly, but going backwards, we ensure that p^ G Si are reduced with respect 
to Pr G Si with r > s. □ 

To continue we need a lemma about linear systems. 

Lemma 9.5. Let ni = dim ker(^) p'rac(i?:) O'^d n2 = dimkeic{A) prac(R/ 1) ■ Then 
n-i < n2- Moreover, if ni = n2, then ]iCY{A)p,.ac{R/i) = ''i"(kcr(j4)/Vj-), where n is 
the quotient map and Ri is the localisation of R at I, i.e., the ring of fractions ^ 
with r € R and s G R\I. 

Proof. For vector spaces we have dimker(A) = dimdom(^) — rank(A), so ni < 

n2 is equivalent to TSLn'k{A)pj.ac(R) ^ rank(A)^,.„c(_R//). This inequality follows 
easily by considering minors of A. So it remains to prove that ni = n2 implies 
keT{A)FraciR/i) = 7r(ker(A)/j, . Let m = Ta.nk{A) Frac{R/ 1) ■ It is enough to prove 
this equality for surjective A over Frac{R). Namely permuting rows of A we can 
write A in block form: 

Ai 

A2 

such that Ai has m rows and Tank{Ai) pj.ac{R/ 1) = "t-- Since ni = n2 this means that 

also Ta.nk{Ai)Frac{R/I) = ^Sink{Ai) prac{R)- Next, ker{A)prac{R) C kev{Ai)prac{R), 
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and dimkci-{A)Frac(R} = dilRker{Ai)Frac(R}, so kei{A)Frac{R) = ^er{Ai)Frac{R)- 

Consequently, A2 = on 'keT{Ai)Frac{R), so also ker(yl)ijj = ker(Ai)flj-. Similarly, 
ker{A) Frac{R/i) — ^^''^{^i)Frac(R/i)- So, indeed it is enough to prove TT ( ker(Ai ) — 
keT{Ai)Frac{R/i) replacing A by Ai we may assume that A is surjective. By 
permuting columns of A we may assume that A has block form: 

( ^1 A2) 

where Ai is invertible over Frac{R/ 1). This means that determinant of Ai ^ J, so 
Ai is invertible over i?/. Multiplying from the left by A^^ we may assume that Ai 
is identity matrix. But then it is easy to compute the kernel over Rj: it consists 
of the vectors of the form {—A2V,v), where v is an arbitrary vector in the domain 
of A2. This implies that dim 7r(kcr(A)fl^ = dim]icr{ A) Frac{R/i)i which gives the 
claim. □ 

Corollary 9.6. Let P be a a-basis over Frac{R) and let Q be a a-basis over 
Frac{R/I). Assume that defect > defect p^+i and defect > defect q^+i for 
all r. Then defect Pr < defect for all r. If both P and Q are normalised and the 
defects and critical indices of Pr and qr are the same for all r, then P is defined 
over Ri and Q is an image ofP via the quotient map tt. 

Proof. According to Equation (dH), the dimensions of the solution spaces of 
the rational interpolation problem over Frac{R) and Frac{R/ 1) are given by 
Y^^=i max(defect Pr—S, 0), and J27=i inax(defect q^— (5, 0) respectively. By Lemma [931 
and choosing 5 G Z small enough, we obtain that defect p,- < defect q^ . 

When all defects are equal, the dimensions of the solution spaces coincide, so by 
the second part of the lemma, the solutions over Frac(R/ 1) are images of solutions 
over Rj. In particular, for each r the solution q^ is an image of a solution over 
Rj. By Equations p3|) and ^4]) . the sequence H — {hi, . . . , hm} is a cr-basis over 
Frac{R). It is easy to see that normalising H gives a cr-basis which is equal to 
H mod / (here we need the assumption on the critical indices). Since there is a 
unique normalised cr-basis, H mod / and Q coincide. □ 

Corollary 9.7. The previous corollary remains valid if we only assume that I is 
an intersection of prime ideals, and that a-basis computation and normalisation 
worked over Frac{R/ 1) without encountering division by non-invertible element. 

Proof. If / = /i n ■ • • n /„ then Frac{R/ 1) is isomorphic to a subring of product 
Il'^^^Frac{R/ li) and the claim easily follows. □ 

Let us stress that the corollary above means that either all modular images are 
'bad reductions' (that is the dimension of the space of modular solutions is bigger 
than the dimension of the original space, or not all critical indices are equal) which 
is highly improbable, or, by normalising and rejecting solutions with larger defect 
or different critical indices we obtain a consistent normalisation. 

10. Performance 

To test the performance of the package, we ran a few examples with our package, 
GFUN (version 3.5 on Maple 11), and Guess. m (version 0.32 on Mathematica 7.0). 

Timings are in seconds, best of three runs. Guess .m was run on a Intel Core 2 
E8400 ® 3 GHz with 6MB cache and 1.8GB RAM but running a 32-bit operating 
system, the other two on a Intel Pentium 4 3 GHz, 2MB cache, 1 GB RAM. 
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Both Guess and GFUN tried all configurations of order and degree, only Guess .m 
was run with specified order and degree of the recurrence. Since both GFUN and 
Guess. m look for homogeneous recurrences by default, we invoked guessPRec with 
homogeneous==true. We believe that neither GFUN not Guess. m check the recur- 
rence found, thus guessPRec was invoked with check==' skip. 

On the one hand, we recovered randomly generated homogeneous polynomial 
recurrences over Q from data, see Table [2] On the other hand, we computed 
homogeneous polynomial recurrences over Q[t\ for the first few integer powers of 
the Hermite polynomials, see Table[3] (This second test was only run against GFUN. 
For comparison, we also indicate order and degree of the recurrence discovered.) 

Readers should be cautious interpreting the data. Theoretically, the algorithm 
used by GFUN has lower complexity for large degrees, while Guess.m seems best 
adapted to very low degrees. However, as we explained performance depends very 
much on implementation detais and we lack sufficient information about the other 
packages to make a more general and precise statement. 

11. Further work 

To conclude, we would like to point out possible future directions: 

• It would be very important to generalise to the multidimensional case, as 
already implemented by Manuel Kauers in his package. Of course, we can 
employ 'diagonal guessing', see [36]. I.e., we could first guess formulas for 
each row, and then guess formulas for the coefficients of these. However, this 
approach is rather slow and, more importantly, depends on the availability 
of many terms. 

• The performance of guessExpRat and guessBinRat is very disappointing, 
making the two procedures nearly useless. Moreover, these two are but 
a toy example for real world applications, where one would like to guess 
formulas like 

VV i+J // ^ 2(4i + 2)!(4i + 3)! ^ (37i + 2)!i!(n-i)!(4n + 2i + 3)!!' 

as found by Omer Egecioglu, Timothy Redmond and Charles Ryavec [12] . 

• Maybe there are other interesting operators besides A„ and Qn that could 
be applied recursively to the sequence. Furthermore, there is a list of trans- 
formations used in The online encyclopedia of integer sequences, it might 
be rewarding to check which of those extend the class of functions already 
covered significantly. 
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order\degree 


10 


20 


30 


40 


50 


5 


0.0 
0.3 
0.1 


0.1 
1.1 
0.5 


0.3 

1.7 
1.8 


0.6 

4.3 
5.0 


1.1 
5.7 
11.3 


10 
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0.3 
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5.6 
2.4 
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10.8 
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29.2 
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27.8 
65.0 


15 


0.4 
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0.9 
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7.5 
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42.6 
33.3 


12.5 
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87.1 


24.5 
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196.5 


115.6 
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25 


2.2 
40.3 
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10.2 
85.3 

30.8 








30 


4.2 
75.8 
5.2 


18.7 
162.8 
50.1 








35 


7.3 
132.3 
7.7 


32.6 
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75.8 








40 


11.6 
221.0 
11.0 


51.8 
604.9 
120.0 








45 


17.8 
353.6 

15.1 
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906.5 

164.9 








50 
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20.0 
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26.0 
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60 


52.3 
1094.6 

38.6 


222.2 
2516.6 
362.3 








65 


70.3 
1509.0 

48.0 










70 


92.5 
1927.2 
58.5 











Table 2. Guessing random homogeneous recurrences with poly- 
nomial coefficients over Q. The first line is Guess, the second GFUN, 
the third Guess.m. 
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H{.,tf 




order 


3 


4 


5 


6 


7 


8 


degree 


1 


3 


7 


13 


22 


34 


Guess 


0.0 


0.0 


0.0 


0.4 


5.1 


46.2 


GFUN 


0.0 


0.1 


2.5 


20.2 


238.3 


fail 



Table 3. Guessing random homogeneous recurrences for powers 
of the Hermite polynomials. (GFUN ran out of memory computing 
the last entry.) 
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